4 research outputs found

    Multicriteria Decision Making for Carbon Dioxide (CO2) Emission Reduction

    The rapid industrial revolution around the world has increased emissions of carbon dioxide (CO2), which have badly affected the atmosphere. The main sources of CO2 emission are vehicles and factories, which use oil, gas, and coal. Likewise, the growing mobility of automobiles increases CO2 emissions day by day. Roughly 40% of the world’s total CO2 emission is attributed to the use of personal cars on busy and congested roads, which burn more fuel. In addition, the unavailability of parking in all parts of cities and the use of conventional methods for finding parking areas add to this problem. To reduce CO2 emission, a novel cloud-based smart parking methodology is proposed. This methodology enables drivers to automatically search for the nearest parking areas and recommends the most preferred ones that have empty lots. To determine preferences, the methodology uses the analytic hierarchy process (AHP), a multicriteria decision-making method, and aggregates the decisions with the weighted sum model (WSM). Sorting, multilevel multi-feature filtering, exploratory data analysis (EDA), and the WSM are used to rank parking areas and recommend the top-k parkings to drivers. To implement the methodology, a scenario comprising cars and smart parking areas is considered. For EDA, the freely available dataset “2020testcar-2020-03-03” is used to estimate the CO2 emitted by cars. For evaluation, the results obtained are compared with those of a traditional approach; the comparison shows that the proposed methodology outperforms the traditional approach.
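    A minimal sketch of the WSM ranking step described above, assuming pre-normalized criteria values and fixed weights; the criteria names, weights, and parking data below are illustrative assumptions, not the paper's actual inputs (in the paper the weights would come from an AHP pairwise-comparison matrix).

```python
# Illustrative sketch (not the paper's implementation): ranking candidate
# parking areas with a weighted sum model (WSM). Criteria names, weights,
# and values are assumed for demonstration only.

def wsm_rank(parkings, weights, top_k=3):
    """Score each parking area as a weighted sum of its normalized criteria
    and return the top-k highest-scoring areas."""
    scored = []
    for p in parkings:
        score = sum(weights[c] * p[c] for c in weights)
        scored.append((p["name"], score))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:top_k]

# Hypothetical criteria, already normalized to [0, 1]; higher is better.
parkings = [
    {"name": "P1", "proximity": 0.9, "free_slots": 0.4, "cost": 0.6},
    {"name": "P2", "proximity": 0.5, "free_slots": 0.9, "cost": 0.8},
    {"name": "P3", "proximity": 0.7, "free_slots": 0.6, "cost": 0.5},
]
# In the paper these weights would be derived via AHP; fixed here.
weights = {"proximity": 0.5, "free_slots": 0.3, "cost": 0.2}

print(wsm_rank(parkings, weights, top_k=2))
```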

    Algorithm selection using edge ML and case-based reasoning

    In practical data mining, a wide range of classification algorithms is employed for prediction tasks. However, selecting the best algorithm is challenging for machine learning practitioners and experts, primarily due to the inherent variability in the characteristics of classification problems (datasets) and the unpredictable performance of these algorithms. Dataset characteristics are quantified as meta-features, while classifier performance is evaluated using various performance metrics. Assessing classifiers empirically across multiple classification datasets, while considering multiple performance metrics, is computationally expensive and time-consuming, which hampers the selection of the optimal algorithm. Furthermore, the scarcity of training data, in terms of both the number of datasets and the feature space described by meta-feature perspectives, adds further complexity to algorithm selection with classical machine learning methods. This paper presents an integrated framework called eML-CBR that combines edge ML and case-based reasoning to address the algorithm selection problem. It adapts a multi-level, multi-view case-based reasoning methodology that considers data from diverse feature dimensions and algorithms from multiple performance aspects, and distributes computations across cloud edges and centralized nodes. At the edge, first-level reasoning employs machine learning methods to recommend a family of classification algorithms; at the second level, it recommends a list of the top-k algorithms within that family, which is further refined by an algorithm conflict resolver module. The eML-CBR framework offers a suite of contributions, including integrated algorithm selection, multi-view meta-feature extraction, innovative performance criteria, improved algorithm recommendation, data scarcity mitigation through incremental learning, and an open-source CBR module. The CBR module, trained on 100 datasets and tested on 52 datasets using 9 decision tree algorithms, achieved an accuracy of 94% for correct classifier recommendations within the top k=3 algorithms, making it highly suitable for practical classification applications.
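    A hedged sketch of the two-level retrieval idea the abstract describes: nearest cases (datasets described by meta-features) first vote on a classifier family, then the top-k algorithms of that family are ranked by their stored performance. The case base, meta-feature vectors, algorithm names, and accuracies below are invented for illustration and are not the paper's data or its exact retrieval scheme.

```python
# Illustrative two-level case-based recommendation: family vote, then top-k
# algorithms within the winning family. All numbers here are made up.
import numpy as np

case_base = [
    # (meta-feature vector, algorithm family, {algorithm: accuracy})
    (np.array([0.2, 0.8, 0.1]), "tree",  {"C4.5": 0.91, "CART": 0.88}),
    (np.array([0.3, 0.7, 0.2]), "tree",  {"C4.5": 0.85, "RandomTree": 0.82}),
    (np.array([0.9, 0.1, 0.6]), "bayes", {"NaiveBayes": 0.79}),
]

def recommend(query, k_cases=2, top_k=3):
    # Level 1: retrieve the k nearest cases and vote on the family.
    ordered = sorted(case_base, key=lambda c: np.linalg.norm(c[0] - query))
    nearest = ordered[:k_cases]
    families = [c[1] for c in nearest]
    family = max(set(families), key=families.count)
    # Level 2: rank algorithms of that family by mean stored accuracy.
    scores = {}
    for _, fam, perf in nearest:
        if fam == family:
            for algo, acc in perf.items():
                scores.setdefault(algo, []).append(acc)
    ranked = sorted(scores, key=lambda a: -np.mean(scores[a]))
    return family, ranked[:top_k]

print(recommend(np.array([0.25, 0.75, 0.15])))
```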

    Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs)

    The diversity and proliferation of knowledge bases have made data integration one of the key challenges in the data science domain. Imperfect representations of entities, particularly in graphs, add further challenges to data integration. Graph dependencies (GDs) have been investigated in existing studies for the integration and maintenance of data quality on graphs. However, most graphs contain many duplicates with high diversity, so the existence of dependencies over these graphs becomes highly uncertain. In this paper, we propose graph probabilistic dependencies (GPDs), a novel class of dependencies for graphs, to address the issue of uncertainty over large-scale graphs. GPDs provide a probabilistic explanation for dealing with uncertainty while discovering dependencies over graphs. Furthermore, a case study is provided to verify the correctness of the data integration process based on GPDs. Preliminary results demonstrate the effectiveness of GPDs in reducing redundancies and inconsistencies over the benchmark datasets.
    Data Science Research Centre (DSRC) at the University of Derby
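    A rough sketch of the general intuition behind a probabilistic dependency over a graph, under the assumption that a dependency is kept when the fraction of matches satisfying it reaches a probability threshold; the nodes, attributes, and threshold below are invented, and the paper's formal GPD semantics may differ.

```python
# Illustrative only: a dependency "nodes agreeing on 'isbn' should agree on
# 'title'" is accepted when its estimated confidence meets a threshold.
from itertools import combinations

nodes = [
    {"id": 1, "isbn": "111", "title": "Graph Data"},
    {"id": 2, "isbn": "111", "title": "Graph Data"},
    {"id": 3, "isbn": "111", "title": "Graph  Data "},   # noisy duplicate
    {"id": 4, "isbn": "222", "title": "Data Quality"},
]

def dependency_confidence(nodes, lhs="isbn", rhs="title"):
    """Fraction of node pairs agreeing on `lhs` that also agree on `rhs`."""
    matched = satisfied = 0
    for a, b in combinations(nodes, 2):
        if a[lhs] == b[lhs]:
            matched += 1
            satisfied += a[rhs] == b[rhs]
    return satisfied / matched if matched else 1.0

conf = dependency_confidence(nodes)
print(conf, "holds" if conf >= 0.6 else "rejected")  # threshold assumed
```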

    A unified graph model based on molecular data binning for disease subtyping.

    Molecular disease subtype discovery from omics data is an important research problem in precision medicine. The biggest challenges are the skewed distribution and data variability in the measurements of omics data. These challenges complicate the efficient identification of molecular disease subtypes defined by clinical differences, such as survival. Existing approaches adopt kernels to construct patient similarity graphs from each view through pairwise matching. However, the distance functions used in these kernels cannot exploit the potentially critical information in extreme values and data variability, which leads to a lack of robustness. In this paper, a novel robust distance metric (ROMDEX) is proposed to construct similarity graphs for molecular disease subtyping from omics data, addressing the challenges of data variability and extreme values. The proposed approach is validated on multiple TCGA cancer datasets, and the results are compared with multiple baseline disease subtyping methods. Results are evaluated with Kaplan-Meier survival analysis and validated with statistical tests, e.g., the Cox proportional hazards test (Cox p-value); the null hypothesis that the cohorts have the same hazard is rejected for p-values less than 0.05. The proposed approach achieved best p-values of 0.00181, 0.00171, and 0.00758 for gene expression, DNA methylation, and microRNA data, respectively, indicating a significant difference in survival between the cohorts. The proposed approach outperformed existing state-of-the-art disease subtyping approaches (MRGC, PINS, SNF, Consensus Clustering, and iCluster+) on various individual disease views of multiple TCGA datasets.
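    A hedged sketch of the general pipeline stage the abstract describes, namely building a patient similarity graph from skewed omics measurements with a robust, outlier-tolerant distance. This is not the ROMDEX metric itself (its definition is not given here); the median-absolute-deviation scaling, the random data, and the neighbourhood size are assumptions for illustration.

```python
# Illustrative patient similarity graph from an omics matrix, using a
# distance over MAD-scaled features so extreme values have less influence.
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(size=(6, 20))          # 6 patients x 20 features, skewed

med = np.median(X, axis=0)
mad = np.median(np.abs(X - med), axis=0) + 1e-9
Z = (X - med) / mad                      # robust per-feature scaling

def similarity_graph(Z, k=2):
    """k-nearest-neighbour graph over patients from robust-scaled distances."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    edges = []
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # index 0 is the patient itself
            edges.append((i, int(j), float(D[i, j])))
    return edges

print(similarity_graph(Z))
```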